Spatio-temporal data prediction of multiple air pollutants in multi-cities based on 4D digraph convolutional neural network

In response to the problem that current multi-city multi-pollutant prediction methods based on one-dimensional undirected graph neural network models cannot accurately reflect the two-dimensional spatial correlations and directedness, this study proposes a four-dimensional directed graph model that can capture the two-dimensional spatial directed information and node correlation information related to multiple factors, as well as extract temporal correlation information at different times. Firstly, A four-dimensional directed GCN model with directed information graph in two-dimensional space was established based on the geographical location of the city. Secondly, Spectral decomposition and tensor operations were then applied to the two-dimensional directed information graph to obtain the graph Fourier coefficients and graph Fourier basis. Thirdly, the graph filter of the four-dimensional directed GCN model was further improved and optimized. Finally, an LSTM network architecture was introduced to construct the four-dimensional directed GCN-LSTM model for synchronous extraction of spatio-temporal information and prediction of atmospheric pollutant concentrations. The study uses the 2020 atmospheric six-parameter data of the Taihu Lake city cluster and applies canonical correlation analysis to confirm the data’s temporal, spatial, and multi-factor correlations. Through experimentation, it is verified that the proposed 4D-DGCN-LSTM model achieves a MAE reduction of 1.12%, 4.91%, 5.62%, and 11.67% compared with the 4D-DGCN, GCN-LSTM, GCN, and LSTM models, respectively, indicating the good performance of the 4D-DGCN-LSTM model in predicting multiple types of atmospheric pollutants in various cities.


Introduction
In recent years, with the increasing level of industrialization and urbanization in China, energy consumption has risen sharply.The primary energy consumption, mainly coal, oil, and natural gas, accounts for as much as 84.7% of the total.The extensive use of fossil fuels has directly led to poor air quality and high concentrations of atmospheric pollutants in economically developed regions [1].According to the Environmental Air Quality Standards prescribed by the Chinese government, there are six basic atmospheric pollutants, which include particulate matter and gaseous pollutants [2]: PM2.5, PM10, NO2, SO2, CO, and O3.These atmospheric pollutants have been identified as the world's biggest environmental health risk factors by the World Health Organization.In the past decade, various provinces and cities in China have experienced extreme low visibility and severe air pollution events, and the type of atmospheric pollutants has changed from industrial smoke to finer particulate matter such as PM2.5 and PM10 [3].Air pollution not only directly harms human health and hinders human social activities but also has negative impacts on natural ecosystems.For instance, SO2 causes soil acidification and forest degradation, PM2.5 and PM10 reduce atmospheric visibility, O3 causes significant decomposition of the global climate, and atmospheric pollutants carrying nitrogen and phosphorus elements through dry and wet deposition cause eutrophication of lakes and rivers [4].Therefore, predicting the concentration of atmospheric pollutants is of great necessity.
Currently, there are two main types of predictive modeling methods for atmospheric pollutant concentration: mechanism-based atmospheric pollutant predictive modeling and datadriven atmospheric pollutant predictive modeling [5][6][7].Mechanism-based atmospheric pollutant predictive modeling is based on atmospheric dynamics and environmental chemistry [8,9].It constructs mathematical models using equations to analyze the spatial and temporal distribution and diffusion of pollutants based on atmospheric pollutant emission source data and meteorological data [10,11].The model is then used to calculate and solve the concentration and distribution of future pollutant changes and make predictions [12][13][14].For example, Liu Xiaoyong et al. constructed a Flexpart atmospheric pollution spatiotemporal diffusion model to analyze the characteristics of atmospheric pollution in the industrial cities of Beijing-Tianjin-Hebei [15].Chen Jingfeng et al. used a Gaussian plume model to analyze the diffusion law of PM2.5 in Xi'an [16].Yu Xiaomeng et al. used three commonly used Lagrangian atmospheric pollution diffusion models to provide further research and reference directions for environmental pollutant prediction and simulation [17].Liu Gang et al. used nonlinear dynamics to examine the spectral distribution of atmospheric pollutant concentration time series, the fractal dimension of the time-varying curve, and the correlation dimension of the attractor in phase space for the first time [18].However, the production and concentration of atmospheric pollutants are influenced by many factors, such as the spatial and geographical location of the pollution source and regional meteorological conditions, making it difficult to accurately establish the system structure.Therefore, relying solely on mechanism-based modeling methods is not suitable for predicting the concentration of atmospheric pollutants.
Data-driven atmospheric pollutant prediction models are built on the theories of data analysis and data mining.These models do not rely on the generation and diffusion mechanisms of pollutants; they do not depend on related physical, chemical actions, and biological processes.They are based on statistical methods and analyze the concentration changes of atmospheric pollutants and related influence factors to provide future predictions of atmospheric pollutant concentrations within a certain period [19,20].For example, Liu Xiaoming et al. used the VGG model to predict the concentration of PM2.5 data in the form of K-line charts, which can obtain richer information about the local variations of PM2.5 concentrations [21].Hongbin Dai et al. proposed an Improved PCA-MEE and ISPO-LightGBM model to evaluate the hazard risk of haze in multiple cities, such as Xi'an [22].Huang Guangqiu et al. proposed the construction of an XGBoost-MLP based on GARCH models to predict PM2.5 concentrations and fluctuations in different regions [23].Compared to mechanism-driven methods, data-driven predictions have lower costs and offer significant advantages, especially in shortterm multi-frequency predictions.However, most existing atmospheric pollutant concentration predictions are based on single data-driven models, which may not provide accurate results [11,24,25].
The massive, high-dimensional, spatiotemporal evolution, and multi-factor characteristics of multi-city atmospheric pollutant concentration data conform to big data features.Therefore, deep learning methods suitable for big data analysis should be applied for prediction [26].Traditional recurrent neural networks and long short-term memory networks, when dealing with time series data, capture the short-and long-term dependencies among elements in the time series, providing guidance for the calculation of elements that are sorted later in the sequence [27].However, the diffusion of atmospheric pollutants in multiple cities is influenced by dynamic and thermal factors, exhibiting spatial correlations [28].Therefore, when studying urban atmospheric pollutant concentrations, it is necessary to consider the spatiotemporal correlations among surrounding cities' atmospheric pollutant concentrations and adopt an appropriate method for modeling spatiotemporal sequences with multiple factors.
Graph convolutional neural networks (GCNs) are modeling frameworks based on graph theory that utilize deep learning for graph data, such as nodes and edges, describing pairwise relationships between nodes.Some cutting-edge models have been proposed based on GCNs, especially for urban transportation and human flow problems, providing inspiration for predicting multi-city and multi-pollutant atmospheric pollution [29,30].For example, Yanmin Zhu et al. proposed a GCN-DHSTNet model to measure the complicated spatialtemporal dependencies with external factors and predict crowd flow movements.In this research, cities are treated as nodes, city connections as edges, and the characterization factors of atmospheric pollution as node attributes.Traditional GCNs, when dealing with spatiotemporal sequence predictions, often only consider constructing a one-dimensional spatial distance graph to represent the spatial correlations of multi-city and multi-pollutant atmospheric pollutants.They mostly use undirected graphs to reflect the diffusion movement influences of various atmospheric pollutants among cities.Liu Juan et al. used the Rdistance method to establish spatial relationships between graph nodes.Guo Kunpeng et al. used surface distances between different spatial stations to serve as edge weight information in the graph model [31].Zhang Yuhuai et al. used three different graph structure definition methods-geographic distance, Gaussian diffusion model, and matrix training-to construct graph neural network models [32].However, the practical inter-city relationships of atmospheric pollutants contain two-dimensional spatial relationships, including longitude and latitude, and exhibit a certain degree of directionality in pollutant diffusion.A directed graph containing two-dimensional spatial information should describe the spatial correlations among cities, while considering the multi-factor correlations and temporal dependencies of various atmospheric pollutants [33].
To address the issue of inaccurate predictions of multi-pollutant atmospheric concentrations in cities using existing techniques, this invention provides a multi-city and multi-pollutant atmospheric prediction method based on a four-dimensional directed GCN-LSTM (4D-Digraph Convolutional Network-Long and Short-Term Memory) model.This method addresses the issues of traditional GCNs in predicting multi-city atmospheric pollutant concentrations, where the graph models constructed do not consider the two-dimensional spatial relationships between cities and the directionality of atmospheric pollutant diffusion.By constructing a four-dimensional directed graph model that includes two-dimensional spatial directionality information, captures one-dimensional related information among multiple factors within graph nodes, and extracts one-dimensional related information at different temporal moments, this method provides a novel approach for predicting multi-city and multipollutant atmospheric concentrations with spatiotemporal characteristics.

Proposed approach a. Construct the graph of four-dimensional directed GCN model
Based on the geographical location of the city being studied, a four-dimensional directed GCN model graph is established.The graph is a two-dimensional directed graph, where each city is represented as a node with the node's position being the latitude and longitude two-dimensional coordinates of the city center.Directed edges exist between nodes, represented as a planar vector pointing from one node to another.
A two-dimensional space directed graph G = (ν, ε) is defined, n ¼ fv 1 : }city1}; v 2 : }city2}; � � � ; v N : }cityN}g is the node set of the graph, and v i represents the ith city node, i = 1,2,� � �,N.The location of the node is represented by the two-dimensional spatial information of the geographical longitude and dimension of the city center.Each atmospheric pollutant is regarded as a node attribute with a value that changes over time.A two-dimensional directed graph is constructed, and its adjacency matrix, degree matrix, and Laplacian matrix are established.The adjacency matrix describes the information of nodes and edges between them, while the degree matrix represents the sum of weights between nodes and their connected nodes.The time series concentration data of various atmospheric pollutants across multiple cities can be represented as a sequence of information of the two-dimensional directed graph.
Given A two-dimensional spatial directed graph G = (ν, ε), the adjacency matrix A is defined to describe the information of nodes and edges between nodes in the graph.The element A in row i and column j of the adjacency matrix A represents the connection relation between nodes v i and v j .Consideration must be given to the impact of pollutant diffusion from neighboring cities on the concentration of atmospheric pollutants in each city.In other words, the spatial relationships between different city nodes influence the concentration of atmospheric pollutants in the city.To model the edge information between nodes, an adjacency matrix is constructed using the latitude and longitude coordinates of each city, represented as a planar vector.Each city node possesses a unique latitude and longitude coordinate, and the edge relationship between any two cities is represented as a planar vector constructed from the difference between the respective latitude and longitude coordinates.The specific adjacency matrix, denoted as matrix A, is as follows: Where v i and v j represent the latitude and longitude coordinates of city i.
The degree matrix D is defined to describe the sum of weights between nodes and adjacent nodes in the graph G = (ν, ε), which corresponds to the adjacency matrix one by one.The degree matrix D is a diagonal matrix, that is, except the element D ii in row i and column i represents the sum of weights of edges related to node i, the other positional elements are 0, i = 1,2,� � �,N.The degree matrix D is as follows: Then: Define the Laplacian matrix L as the difference between the degree matrix D and the adjacency matrix A. Laplace matrix L is calculated by the following formula: The concept of the two-dimensional directed graph serves as the key to understanding the graph of the four-dimensional directed GCN model since it contains the spatial relation graph, which is a two-dimensional directed graph.The spatio-temporal series of different air contaminants in several cities are created into graph data form, as shown in

b. Spectral decomposition and tensor operation of four-dimensional directed GCN model
In the traditional frequency domain graph convolutional neural network, the condition of frequency domain convolution for the Laplacian matrix L of the graph is that the graph is undirected, that is, the graph's degree matrix D and adjacency matrix A are symmetric matrices.
The spectral decomposition of the four-dimensional directed GCN model is to construct a new adjacency matrix A 0 from the adjacency matrix A of the two-dimensional space directed graph by linear algebraic transformation.The new adjacency matrix A 0 is a symmetric matrix, containing the information of the old degree matrix D and the old adjacency matrix A. According to the new adjacency matrix A 0 renewal matrix D 0 , a new symmetric Laplace matrix L 0 is obtained.Spectral decomposition of the new Laplace matrix L 0 is carried out to obtain the eigenvalue matrix Λ and eigenvector matrix U, which are used as the graph Fourier coefficients and graph Fourier bases respectively, and can carry out graph convolution operation in the spectral domain.
According to the property of antisymmetric matrix, if D is A symmetric matrix and A is an antisymmetric matrix, then the following relation exists: The top script T represents the transpose of the matrix.The new adjacency matrix A 0 = DA−AD composed of A and D is a symmetric matrix, then the update degree matrix D 0 is as follows: Update the Laplacian matrix L 0 as follows: The updated Laplace matrix L 0 is a symmetric matrix and can be decomposed spectral.After spectral decomposition, eigenvalue matrix Λ and eigenvector matrix U are obtained as follows: Where, ðl x i ; l y i Þ represents the ith eigenvalue of matrix L 0 , μ i represents the eigenvector corresponding to the ith eigenvalue, ðm x ij ; m y ij Þ represents the ith element in the jth eigenvector.
The input data of the traditional GCN model is a two-dimensional matrix H, as follows: Where, matrix H is a two-dimensional matrix of N*M, and h ij represents the input data of the jth air pollutant in the ith city to the traditional GCN model For the four-dimensional directed GCN model, the matrices Λ and U are N*N*2 threedimensional matrices, and the problem of tensor dimension mismatch exists when the matrix Λ and U perform tensor operation with matrix H. Therefore, the Numerical Python library broadcasting mechanism is used to extend the dimension of H and copy elements, and H is transformed into the input data of the four-dimensional directed GCN model, as follows: The transformed matrix H is a three-dimensional matrix of N*M*2, and (h NM , h NM ) T represents the broadcasting stretching and transposing of elements h NM in the third dimension.
Transform into a three-dimensional matrix with the same dimension as the eigenvector matrix, and perform tensor operation between the transformed input matrix and the eigenvector matrix to integrate the directional information of two-dimensional space; The transformed H is a three-dimensional matrix, and the formula for tensor operation of dot product with U is as follows: The input matrix is a matrix obtained according to the concentration data of multiple air pollutants in multiple cities.

c. Improved graph filter for four-dimensional directed GCN model
The graph Fourier basis and the graph Fourier coefficient are discovered after the Fourier transform of the graph signal and its conversion to the frequency domain.By modulating the graph filter, the graph Fourier coefficient is altered.Traditional graph filters use the Chebyshev filter, however there will be significant variations near the passband frequency, leading to unstable signal processing.The issue of noise interference in the end portion of the signal is therefore resolved by the present invention, which employs the Hermite filter as a graph filter to filter the converted input matrix.
Define the graph filter polynomial as g θ (Λ), as follows: Where θ κ represents the coefficient of the filter, K represents the graph filter polynomial truncation parameter, used to modulate the graph input data H, Feature transformation is realized by convolution operation between data H and graph filter g θ (Λ).The specific formula is as follows: However, the bases {Λ 0 , Λ 1 , Λ 2 ,� � �,Λ K } of the graph filter polynomial g θ (Λ) are not orthogonal to each other, that is, the coefficients are interdependent, and each part of the basis is easy to interfere with each other during model training.Therefore, Hermite polynomial with orthogonality is used to update the graph filter, and the recursive relation of Hermite polynomial is as follows: Where, k�2, Her 0 (Λ) = Λ 0 , Her 1 (Λ) = 2Λ Substituting Eq (9) into Eq (7), we get: Substitute Eq (10) into Eq (8) and use Hermite polynomial filter for matrix H.The specific calculation is shown in Eq (11): Where, ST represents the output data of Hermite polynomial filter.

d. The four-Dimensional directed GCN-LSTM model is constructed
The concentration data of various atmospheric pollutants across multiple cities not only exhibit spatial correlation, i.e., a two-dimensional directed structure, but also possess temporal and multi-factorial correlation.The four-dimensional directed GCN model with its twodimensional directed graph structure can reveal the spatial correlation between multi-factorial data.However, it is equally crucial to extract information regarding the temporal and multifactorial correlation among the data.Firstly, the calculation formula of the four-dimensional directed GCN model is explained.The concentration matrix of multiple air pollutants in multi-cities outputs the input matrix of the four-dimensional directed GCN model through the LSTM model.The filtered data can be obtained by the input matrix H t through step three-dimensional transformation and step four As shown in Fig 3, the structure and implementation functions of the four-dimensional directed GCN model mainly include: Information is extracted from the longitude and dimension of the two-dimensional directed graph of the four-dimensional GCN model to obtain the adjacency matrix, degree matrix and Laplace matrix.The spectral decomposition of the Laplace matrix is carried out to obtain the eigenvalue matrix and eigenvector matrix, corresponding to the graph Fourier coefficient and the graph Fourier basis respectively.The graph filter modulates the graph Fourier coefficient.The input data H t is filtered, and the filtered data ST t is output ST 0 t through the network layer training of the four-dimensional directed GCN model, and sent into the input gate, oblivion gate and output gate together with the input data H t .Combined with the output i 0 t of the input gate and the output g 0 t of the forgetting gate, the integrated output G t of the short-term memory and long-term memory of the spatial characteristics at time t is obtained.From the output O 0 t and G t of the output gate, the output data Ĥt of the four-dimensional directed GCN model at time t is obtained.
The calculation formula of the four-dimensional directed GCN model is as follows:   The formula for calculating the LSTM structure graph in the four-dimensional directed GCN-LSTM model is as follows: Where

Experimental data and parameter optimization a. Datasets
Taking the data of six major air pollutants, PM2.5, PM10, NO2, SO2, CO, and O3, in the Yangtze River Delta and Beijing-Tianjin-Hebei urban agglomerations as examples, this study predicts these six pollutants.The Yangtze River Delta urban agglomeration includes Suzhou, Nantong, Changzhou, Yixing, Huzhou, Jiaxing, and Shanghai, while the Beijing-Tianjin-Hebei urban agglomeration includes Beijing, Tianjin, Baoding, Datong, Zhangjiakou, Chengde, and Qinhuangdao.Using the 2020 urban meteorological monitoring data as an example, after data screening, a total of seven cities' data were selected, which includes six atmospheric parameters.The time span of the data is from January 1, 2020, to December 31, 2020, a total of 366 days, with data collected every three hours, making it 8 time points per day.The data dimensions are (2928, 7, 6), which represents the length of time points is 2928, there are 7 cities, and 6 atmospheric pollutants.The specific information of the obtained data set is shown in Tables 1 and 2.
For experiments, the first 330 days of data were used for model training, and the remaining days were divided into a 30-day testing period following the 330-day training period.Due to the limited amount of data available in the training set, over-fitting during model training was a concern.Therefore, Monte Carlo Time Series Cross-Validation (MCTS-CV) was applied to the training data, as illustrated in Fig 5 .The approach involves two steps.First, 60% of the time series data from the 330 days were randomly selected as the training set, and 10% were selected as the validation set.No temporal gaps existed between the training and validation sets, and the sample size for the two sets remained constant and contiguous.Second, five iterations were set, in which the starting point for the testing set was randomly selected five times to yield five different sets of training and validation data.The model was trained on each set, which constituted different time-series data, for model evaluation.

b. Canonical correlation analysis
Multiple air pollutant concentration data from different cities were analyzed for correlation, revealing significant correlation between various air pollutants in different cities.Air pollutant concentration data exhibit temporal correlation and correlation not only between different types of pollutants (PM2.5, PM10, NO2, SO2, CO, and O3) but also between different cities, constituting a spatio-temporal sequence of data with multiple factors affecting the correlation.Therefore, this step employs Canonical Correlation Analysis (CCA), a typical method for analyzing and quantifying the correlation between two sets of variables, to measure the strength of the correlation between the predicted values of multiple air pollutants in various cities and the historical values of multiple pollutants in different cities.
t values at the current time of M kinds of air pollutants in N cities and p values at historical times were selected as X group data set.The number of variables in X group is N*M*(p+1).t+1 time values of M kinds of air pollutants in N cities are selected as group Y, and the number of variables in groupY is N*M.The two groups of variables are shown in Formulas ( 25) and ( 26 Where, t, t−1,� � �,t−p represents the current moment t and p historical moments, t+1 represents the next moment, i = 1,2,� � �,N, j = 1,2,� � �,M, Var represents the concentration of atmospheric pollutants, Var ij (t−1) represents the concentration of the jth air pollutant in the ith city at time t−1.The first pair of typical variables x 1 and y 1 can be obtained by linear combination of the two groups of variables X and Y, as shown in Formula ( 27): Using Pearson's correlation coefficient calculation formula, the typical correlation coefficients P 1 of the first pair of typical variables x 1 and y 1 are obtained as follows: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Where, Cov is to obtain covariance and Var is to obtain standard deviation.Formula ( 27) and ( 28) are repeated until step r is reached, and the correlation between X and Y variables is extracted, where r = min(number of variables in X, number of variables in Y).The typical variable of r pair is obtained, and the typical load coefficient and cross load coefficient of r pair typical variable and two groups of variables X and Y are analyzed.The typical-load coefficient represents the simple correlation coefficient of typical variable x r and X group variable, and typical variable y r and Y group variable.The cross load coefficient represents the simple correlation coefficient between the typical variables x r and Y, and between the typical variables y r and X.The larger the absolute value of the load coefficient is, the stronger the correlation between this item and the typical variable is.
When you do correlation analysis, you can look at the typical load coefficient and you can look at the cross load coefficient.For example, among the N*M*(p+1) typical load coefficients generated by x 1 and X, the absolute values of the first and Nth coefficients are the highest; Among N*M typical load coefficients generated by y 1 and Y, the absolute values of the second and Mth coefficients are the highest.So that the first and Nth variables in the X group have a strong correlation with the second and Mth variables in the Y group.
Validate using the data set of Table 1 for the Yangtze River Delta City Cluster., N = 7, M = 6, p = 8, the number of variables in group X is 7×6×8 = 336, and the number of variables in group Y is 7×6 = 42.42 pairs of typical variables are constituted to determine whether there is correlation between the concentration of 6 kinds of air pollutants in 7 cities at the moment to be predicted and the concentration values of 6 kinds of air pollutants in 7 cities at 8 historical moments, that is, the concentration data of multiple air pollutants in multiple cities have temporal and spatial correlation.Table 3 shows the typical correlation coefficient calculated by the test data.
According to the typical load coefficient, the typical result analysis is carried out, and the typical load coefficient is obtained to obtain the correlation between the typical variable and all the variables in this group, that is, the simple correlation coefficient of the typical variable x r and the X group variable as shown in Figs 6 and 7, the simple correlation coefficient of the typical variable y r and the Y group variable as shown in Figs 8 and 9, the cross load coefficient is obtained to obtain the correlation between the typical variable and all the variables in another group, that is, the simple correlation coefficient of the typical variable x r and the Y group variable as shown in Figs 10 and 11, the simple correlation coefficient of the typical variable y r and the X group variable as shown in Figs 12 and 13, and drawn into three-dimensional curves.
As can be seen from the analysis results of the above typical correlation analysis method, the first pair of typical variables x 1 and y 1 and the typical load coefficient and cross load coefficient of the two groups of variables X and Y are greater than 0.6 respectively, indicating a high correlation between the spatial and temporal correlation of the concentration data of various air pollutants in different cities.

c. Forecasting performance evaluation and parameters of network
MAE, MAPE and RMSE were used to predict the model results.Define the predicted value as: ẑ ¼ fẑ 1 ; ẑ2 ; � � � ; ẑn g, and the true value as: MAE (Mean Absolute Error) calculation formula is as follows: MAPE (Mean Absolute Percentage Error) calculation formula is as follows: Table 3.Typical correlation analysis of test data.RMSE (Root Mean Square Error) calculation formula is as follows:

Canonical correlation coefficient
RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 n The learning rate of the model is set to 0.1, the regularization parameter is set to 0, the Dropout parameter is set to 0.5, and the activation function is set to ReLU.The Epoch is set to 250 times, and the Batch is set to 64.

a. Concrete structure of four-dimensional directed GCN
Firstly, a two-dimensional plane coordinate axis is established according to the longitude and latitude information of seven cities, as shown in Fig 3.
Each city is the node of G: ν = {v 1 ,� � �,v 7 }, edge e of G is represented by an internode vector.City v 1 defines the city v 2 spatial association as edge According to the definition of the Laplace matrix, the Laplace matrix L can be obtained as follows: ð2:69; 2:8Þ ðÀ 0:71; À 0:23Þ ðÀ 0:42; 0:53Þ ðÀ 0:04; 0:81Þ ð0:52; 0:21Þ ð0:55; À 0:16Þ ð0:45; À 0:86Þ In the calculation of graph convolutional in the spectral domain, spectral decomposition of the Laplacian matrix of the graph is needed to obtain the basis of the Fourier transform of the graph.On this premise, it is necessary to ensure that the Laplacian matrix of the graph is a symmetric matrix before spectral decomposition of the matrix.Therefore, by using the properties of symmetric matrix and antisymmetric matrix 3, namely Formula (3), A new form of graph adjacency matrix A 0 is obtained by transforming the original directed graph into a new undirected graph by formula transformation and preserving the spatial direction information of nodes in the directed graph.The adjacency matrix A 0 after formula transformation is as follows: The new adjacency matrix is obtained as a symmetric matrix by transformation, and its degree matrix D 0 is obtained by corresponding update, as follows: Then, according to Formula (4), the Laplace matrix L 0 can be obtained, and the corresponding eigenvalues Λ and eigenvectors U can be calculated by spectral decomposition.

b. Prediction results of 4D-DGCN-LSTM
The constructed 4D-DGCN-LSTM model is used to predict air pollution.The working process of predicting the concentration of various air pollutants in multiple cities is as follows: The constructed 4D-DGCN-LSTM model is used to predict the concentration of atmospheric pollutants at time t+1.Determine the input of the model for {X t−p ,� � �,X t−1 ,X t } is composed of the current time t and p historical time concentration values of M kinds of air pollutants in N cities, which are specifically expressed as: Where, Var ij (t) represents the jth atmospheric pollutant concentration in the ith city at time t, i = 1,2,� � �,N, j = 1,2,� � �,M, the four-dimensional directed GCN-LSTM model is used to predict the atmospheric pollutant concentration at time t+1.Then the output of the 4D-DGCN-LSTM model at time t Ĥt is the predicted concentration value of M kinds of air pollutants in N cities at time t+1, namely: In order to demonstrate, the predictive results of PM2.5 concentration in Suzhou and Beijing cities were presented.The comparisons were made in terms of Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) metrics for four different models including Long Short-Term Memory (LSTM), Graph Convolutional Network (GCN), Graph Convolutional Network combined with LSTM (GCN-LSTM), and 4D Directed Graph Convolutional Network combined with LSTM (4D-DGCN-LSTM), the evaluation indicators of each model are shown in Tables 4 and 5.
As shown in Fig 14, the red solid line represents the predicted value curve of the proposed method (prediction_4D-DGCN-LSTM), while the green solid line represents the true value curve (target), and the remaining lines represent the predicted value curves of the compared methods.
As shown in Fig 15, the red solid line represents the predicted value curve of the proposed method (prediction_4D-DGCN-LSTM), while the green solid line represents the true value

c. Discussions
The purpose of the experiment is to verify that the proposed 4D-DGCN-LSTM model and the four models 4D-DGCN, GCN-LSTM, GCN, and LSTM are the most effective models for predicting urban air pollutants.(Note: The graph structures of GCN and GCN-LSTM in the compared models use one-dimensional undirected distance graphs.)The following conclusions can be drawn from the analysis of the predicted and true value curves of the PM2.5  4. Firstly, when comparing the evaluation indicators of the LSTM and GCN models (using MAE as an example), the MAE value of the LSTM model is 25.43%, and the MAE value of the GCN model is 19.38%, which is a decrease of 6.05% compared to the LSTM model.Considering that LSTM is a classic algorithm for handling time-series prediction problems and comparing the prediction results of GCN models for multiple cities and multiple air pollutants, it shows the importance of applying spatial features to the prediction of air pollutants.The method of converting multiple cities and multiple air pollutants into graph structures to extract spatial features is suitable for predicting air pollutants in multiple cities or multiple stations.
Secondly, when comparing the evaluation indicators of GCN and GCN-LSTM models, the MAE value of the GCN model is 19.38%, and the MAE value of the GCN-LSTM model is 18.67%, which is a decrease of 0.71% compared to the GCN model.It shows that it is necessary to embed the long short-term memory network model when using the graph convolutional neural network model to predict the concentration of multiple air pollutants in multiple cities, which can further improve the model's predictive accuracy.
Thirdly, when comparing the evaluation indicators of GCN and 4D-DGCN models, the MAE value of the GCN model is 19.38%, and the MAE value of the 4D-DGCN model is 14.88%, which is a decrease of 3.41% compared to the GCN model.It shows that if only the distance information between city nodes is used to construct a one-dimensional undirected graph structure, the directional information between cities will be lost when expanding the spatial correlation of multiple cities.However, the 4D-DGCN model uses city longitude and latitude information to construct a two-dimensional directed graph structure, forming a directed graph convolutional neural network model with spatial two-dimensional, time onedimensional, and multi-factor one-dimensional properties.It not only considers the distance between cities but also the directional angle information between them, greatly improving the model's predictive accuracy.When predicting substances with spatial diffusivity like air pollutants, it is necessary to extract spatial features and consider the two-dimensionality and directional properties of the graph structure.
Finally, the MAE value of the proposed 4D-DGCN-LSTM model is 13.76%, which is a decrease of 1.12%, 4.91%, 5.62%, and 11.67% compared to the 4D-DGCN, GCN-LSTM, GCN, and LSTM models, respectively.The results show that the 4D-DGCN-LSTM model achieves good results in predicting multiple air pollutants in multiple cities.

Conclusion
This article focuses on various air pollutants across multiple cities.It analyzes the correlation between air pollutants and factors such as time and space, and proposes a 4D-D GCN-LSTMbased method for predicting air pollutants across multiple cities.The method uses big data to predict concentrations of six air pollutants across cities like Suzhou and Shanghai.Based on the results of this example verification, the article concludes that. . . 1.For predicting concentrations of various air pollutants in multiple cities, extracting spatial features is crucial.In the past, GCN has been used to address such problems by defining the edges of the graph structure using geographic distance, resulting in a one-dimensional, undirected graph structure.This method lacks accuracy in depicting the spatial relationship between nodes in the topology of urban areas, thus ignoring some spatial information.Therefore, it is necessary to develop a directed graph convolutional neural network that includes two-dimensional structures to better represent the spatial information.
2. To capture the two-dimensional directedness of air pollutants in spatial space, this study employs a two-dimensional directed graph to topologically represent the spatial correlation of air pollutants among multiple cities and proposes a 4D-DGCN model.To address the inability to directly decompose and convolve the spectral domain in a directed graph and tensor calculation, a corresponding two-dimensional directed graph spectral decomposition and tensor calculation algorithm is presented.This algorithm enables spectral domain graph convolutional on the two-dimensional directed graph, providing more spatial correlation information for air pollutant concentration prediction.Based on this, the 4D-DGCN model is embedded in the LSTM model, creating the 4D-DGCN-LSTM model that unifies multiple-factor correlation, time correlation, and spatial correlation of air pollutants for modeling.

Fig 2 .
Fig 2. Diagram of a four-dimensional digraph.https://doi.org/10.1371/journal.pone.0287781.g002 filtering at each time point.Set the input data at time t after filtering as ST t .The output data ST t of Hermite polynomial filter was passed into the four-dimensional directed GCN model network layer for weight and bias training.The network structure of the four-dimensional directed GCN model is shown in Fig 3.

Fig 3 .
Fig 3. Structure diagram of four-dimensional directed GCN model.https://doi.org/10.1371/journal.pone.0287781.g003 t represents the input data of the four-dimensional directed GCN model at time t, ST t represents the output data of Hermite polynomial filter at time t, W G , b G represent the weight and bias of convolution operations in the four-dimensional directed GCN model, respectively.ST 0 t represents the four-dimensional directed GCN model network layer training output at time t; W 0 i ;b 0 i represent the weight and bias of input gates in the 4D directed GCN model, respectively.i 0 t represents the output of the input gate in the four-dimensional directed GCN model at time t; W 0 g ;b 0 g represent the weight and bias of forgetting gate in 4D directed GCN model, respectively.g 0 t represents the output of the forgetting gate in the four-dimensional directed GCN model at time t; W 0 o ;b 0 o represent the weight and bias of the output gate in the 4D directed GCN model, respectively.O 0 t represents the output of the output gate in the fourdimensional directed GCN model at time t, G t−1 represents the output of the integration of long short-term memory of the spatial features at time t−1, G t represents the integrated output of short-term memory and long-term memory of spatial features at time t, and Ĥt represents the output data of the four-dimensional directed GCN model at time t.Then, the long short-term memory cycle structure of the LSTM is used to extract the time sequence features of various factors at various points after the four-dimensional directed GCN model has been integrated into the LSTM network architecture.In order to describe both the spatial and temporal correlation of the multi-factor data simultaneously, the spatial aspects of the graph data are extracted while the time sequence relations of the multi-factor data are preserved.Fig 4 illustrates the precise organization of the four-dimensional directed GCN-LSTM model that simultaneously collects multi-factor temporal and spatial information.

Fig 14 .Fig 15 .
Fig 14.Curve of predicted and true PM2.5 concentration in Suzhou by five methods.https://doi.org/10.1371/journal.pone.0287781.g014 �g is the set of edges between nodes,e ij : v i v j �! for node v i point node v j plane vector, i,j = 1,2,� � �,N,as shown in Fig 1. Suppose the longitude and latitude coordinates of node v i and v j are (x vi , y vi ) and (x vj , y vj ) respectively, then e ij : v i v j �! ¼ ðx vj À x vi ; y vj À y vi Þ.There are directed edges between all nodes.
ĤtÀ 1 respectively represents the output of the four-dimensional directed GCN-LSTM model at time t−1, X t represents the concentration matrix of atmospheric pollutants at time t, f t represents the output of forgetting gate in LSTM network at time t, W f , b f represent the weight and bias of forgetting gate in LSTM network respectively.i t represents input gate output in LSTM network at time t, W i 和b i represent weight and bias of input gate and forgetting gate respectively in LSTM network.g t represents an intermediate write information, W g 和b g represent the weight and bias of the intermediate write information network respectively, O t represents output gate output in LSTM network at time t, W o , b o represent weight and bias of output gate in LSTM network respectively, C t−1 and C t represent the t−1 time state and t time state of a memory unit in LSTM network, respectively.Sigmoid and tanh are two activation functions used in model network training, respectively.

Table 5 . Evaluation metrics for various models using the data set of Yangtze River Delta City Cluster.
https://doi.org/10.1371/journal.pone.0287781.t005curve (target), and the remaining lines represent the predicted value curves of the compared methods.